 Tochigi Prefecture


Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression

Kuroki, Kyo, Okoshi, Yasuyuki, Van Chu, Thiem, Kawamura, Kazushi, Motomura, Masato

arXiv.org Artificial Intelligence

This paper proposes a novel matrix quantization method, Binary Quadratic Quantization (BQQ). In contrast to conventional first-order quantization approaches, such as uniform quantization and binary coding quantization, that approximate real-valued matrices via linear combinations of binary bases, BQQ leverages the expressive power of binary quadratic expressions while maintaining an extremely compact data format. We validate our approach with two experiments: a matrix compression benchmark and post-training quantization (PTQ) on pretrained Vision Transformer-based models. Experimental results demonstrate that BQQ consistently achieves a better trade-off between memory efficiency and reconstruction error than conventional methods for compressing diverse matrix data. It also delivers strong PTQ performance, even though we neither target state-of-the-art PTQ accuracy under tight memory constraints nor rely on PTQ-specific binary matrix optimization. For example, our proposed method outperforms the state-of-the-art PTQ method by up to 2.2% and 59.1% on the ImageNet dataset under the calibration-based and data-free scenarios, respectively, with quantization equivalent to 2 bits. These findings highlight the surprising effectiveness of binary quadratic expressions for efficient matrix approximation and neural network compression.
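For context, the first-order baseline that the abstract contrasts with BQQ (a linear combination of binary bases) can be sketched as follows. This is not the BQQ method itself: it is a minimal greedy binary coding quantization under the assumption that each basis is the sign of the current residual and each scale is the mean absolute residual. The function names and the toy weights are hypothetical.

```python
def greedy_binary_coding(values, num_bases):
    """Greedily approximate `values` as sum_k scale_k * code_k, code_k in {-1, +1}."""
    residual = list(values)
    scales, codes = [], []
    for _ in range(num_bases):
        # For a fixed residual r, code = sign(r) and scale = mean(|r|)
        # minimise the squared error ||r - scale * code||^2.
        code = [1 if r >= 0 else -1 for r in residual]
        scale = sum(abs(r) for r in residual) / len(residual)
        scales.append(scale)
        codes.append(code)
        residual = [r - scale * c for r, c in zip(residual, code)]
    return scales, codes


def reconstruct(scales, codes):
    """Rebuild the real-valued approximation from scales and binary codes."""
    length = len(codes[0])
    return [sum(s * c[i] for s, c in zip(scales, codes)) for i in range(length)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy example: a flattened 2x3 "matrix" (hypothetical values).
weights = [0.9, -0.4, 0.1, -1.3, 0.6, 0.2]
for k in (1, 2, 3):
    scales, codes = greedy_binary_coding(weights, k)
    print(k, squared_error(weights, reconstruct(scales, codes)))
```

Each extra basis costs one more bit per entry plus one scale, and the greedy step never increases the reconstruction error, which is the memory versus error trade-off the abstract refers to.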


QAMA: Scalable Quantum Annealing Multi-Head Attention Operator for Deep Learning

Du, Peng, Shi, Jinjing, Wang, Wenxuan, Ma, Yin, Wen, Kai, Li, Xuelong

arXiv.org Artificial Intelligence

Attention mechanisms underpin modern deep learning, but their quadratic time and space complexity limits scalability for long sequences. To address this, Quantum Annealing Multi-Head Attention (QAMA) is proposed, a novel drop-in operator that reformulates attention as an energy-based Hamiltonian optimization problem. In this framework, token interactions are encoded into binary quadratic terms, and quantum annealing is employed to search for low-energy configurations that correspond to effective attention patterns. Unlike classical sparse or approximate attention methods that rely on hand-crafted heuristics, QAMA allows sparsity structures to emerge naturally from the optimization process. Theoretically, computational complexity is analysed through single-spin flip dynamics, providing time-to-solution runtime bounds that depend on the spectral properties of the annealing Hamiltonian. Empirically, evaluation on both natural language and vision benchmarks shows that, across tasks, accuracy deviates by at most 2.7 points from standard multi-head attention, while requiring only a number of qubits linear in sequence length. Visualizations further reveal that the Hamiltonian penalty terms induce meaningful and interpretable sparsity across heads. Finally, deployment on a coherent Ising machine validates the feasibility of running QAMA on real quantum hardware, showing tangible inference-time reductions compared with classical implementations. These results highlight QAMA as a pioneering and scalable step toward integrating quantum optimization devices into deep neural architectures, providing a seamlessly integrable and hardware-compatible alternative to conventional attention mechanisms. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.


Quantum Annealing for Minimum Bisection Problem: A Machine Learning-based Approach for Penalty Parameter Tuning

Rusnáková, Renáta, Chovanec, Martin, Gazda, Juraj

arXiv.org Artificial Intelligence

The Minimum Bisection Problem is a well-known NP-hard problem in combinatorial optimization, with practical applications in areas such as parallel computing, network design, and machine learning. In this paper, we examine the potential of using D-Wave Systems' quantum annealing solvers to solve the Minimum Bisection Problem, which we formulate as a Quadratic Unconstrained Binary Optimization model. A key challenge in this formulation lies in choosing an appropriate penalty parameter, as it plays a crucial role in ensuring both the quality of the solution and the satisfaction of the problem's constraints. To address this, we introduce a novel machine learning-based approach for adaptive tuning of the penalty parameter. Specifically, we use a Gradient Boosting Regressor model trained to predict suitable penalty parameter values based on structural properties of the input graph, the number of nodes and the graph's density. This method enables the penalty parameter to be adjusted dynamically for each specific problem instance, improving the solver's ability to balance the competing goals of minimizing the cut size and maintaining equally sized partitions. We test our approach on a large dataset of randomly generated Erdős-Rényi graphs with up to 4000 nodes, and we compare the results with classical partitioning algorithms, Metis and Kernighan-Lin. Experimental findings demonstrate that our adaptive tuning strategy significantly improves the performance of the quantum annealing hybrid solver and consistently outperforms the classical methods used, indicating its potential as an alternative for the graph partitioning problem.

Graph partitioning is a fundamental problem in combinatorial optimization, with applications in task scheduling for multiprocessor computers with a focus on parallel computation, circuit partitioning in microchip design, and social network analysis, e.g.
The problem involves dividing a given graph G into two or more subsets while optimizing a certain objective, such as minimizing the number of inter-edges and/or assigned costs between them and producing balanced partitions. While the general Graph Partitioning Problem (GPP) allows for flexible partition sizes, a significant special case is the Minimum Bisection Problem (MBP), where the graph is partitioned into two equal-sized subsets while minimizing the number of inter-edges [1].
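To make the role of the penalty parameter concrete, here is a minimal sketch of one common QUBO encoding of minimum bisection. The paper's exact formulation is not given in this excerpt, so the encoding is an assumption: binary x_i marks the side of node i, the cut size is the sum over edges of (x_i + x_j - 2 x_i x_j), and the balance constraint is added as penalty * (sum_i x_i - n/2)^2. The function names, the toy graph, and the brute-force solver (standing in for an annealer) are hypothetical.

```python
from itertools import product


def bisection_qubo(num_nodes, edges, penalty):
    """Return (Q, offset) with energy(x) = offset + sum_{i<=j} Q[i][j] x_i x_j."""
    n = num_nodes
    q = [[0.0] * n for _ in range(n)]
    # Cut term: an edge (i, j) contributes x_i + x_j - 2 x_i x_j.
    for i, j in edges:
        i, j = min(i, j), max(i, j)
        q[i][i] += 1.0
        q[j][j] += 1.0
        q[i][j] -= 2.0
    # Balance term: penalty * (sum_i x_i - n/2)^2, expanded using x_i^2 = x_i.
    for i in range(n):
        q[i][i] += penalty * (1 - n)
        for j in range(i + 1, n):
            q[i][j] += 2.0 * penalty
    offset = penalty * n * n / 4.0
    return q, offset


def energy(q, offset, x):
    n = len(x)
    return offset + sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force(q, offset):
    """Exhaustive search; only feasible for tiny graphs."""
    return min((energy(q, offset, x), x) for x in product((0, 1), repeat=len(q)))


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q, offset = bisection_qubo(6, edges, penalty=2.0)
best_energy, best_x = brute_force(q, offset)
print(best_energy, best_x)
```

If the penalty is too small, unbalanced assignments with a small cut can undercut the true bisection; if it is too large, the cut term is drowned out on hardware with limited precision. That tension is what motivates tuning the penalty per instance.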


Sub-universal variational circuits for combinatorial optimization problems

Weitz, Gal, Pira, Lirandë, Ferrie, Chris, Combes, Joshua

arXiv.org Artificial Intelligence

Quantum variational circuits have gained significant attention due to their applications in the quantum approximate optimization algorithm and quantum machine learning research. This work introduces a novel class of classical probabilistic circuits, constructed from two-bit stochastic matrices, for generating approximate solutions to combinatorial optimization problems. Through a numerical study, we investigate the performance of our proposed variational circuits in solving the Max-Cut problem on various graphs of increasing sizes. Our classical algorithm demonstrates improved performance over the quantum approximate optimization algorithm for several graph types. Our findings suggest that evaluating the performance of quantum variational circuits against variational circuits with sub-universal gate sets is a valuable benchmark for identifying areas where quantum variational circuits can excel.


Photonic restricted Boltzmann machine for content generation tasks

Luo, Li, Fang, Yisheng, Zhang, Wanyi, Ruan, Zhichao

arXiv.org Artificial Intelligence

The restricted Boltzmann machine (RBM) is a neural network based on the Ising model, well known for its ability to learn probability distributions and stochastically generate new content. However, the high computational cost of Gibbs sampling in content generation tasks imposes significant bottlenecks on electronic implementations. Here, we propose a photonic restricted Boltzmann machine (PRBM) that leverages photonic computing to accelerate Gibbs sampling, enabling efficient content generation. By introducing an efficient encoding method, the PRBM eliminates the need for computationally intensive matrix decomposition and reduces the computational complexity of Gibbs sampling from $O(N)$ to $O(1)$. Moreover, its non-Von Neumann photonic computing architecture circumvents the memory storage of interaction matrices, providing substantial advantages for large-scale RBMs. We experimentally validate the photonic-accelerated Gibbs sampling by simulating a two-dimensional Ising model, where the observed phase transition temperature closely matches the theoretical predictions. Beyond physics-inspired tasks, the PRBM demonstrates robust capabilities in generating and restoring diverse content, including images and temporal sequences, even in the presence of noise and aberrations. The scalability and reduced training cost of the PRBM framework underscore its potential as a promising pathway for advancing photonic computing in generative artificial intelligence.
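As a point of reference for the Gibbs sampling step that the PRBM accelerates, the following is a minimal software sketch of sequential Gibbs sampling on a two-dimensional Ising model. It is a conventional electronic baseline, not the photonic scheme, and it assumes coupling J = 1, no external field, periodic boundaries, and temperature in units of J/k_B. All names are hypothetical.

```python
import math
import random


def lattice_energy(spins):
    """Energy of a periodic 2D Ising lattice with J = 1 and no external field."""
    size = len(spins)
    total = 0
    for r in range(size):
        for c in range(size):
            # Count each bond once via the right and down neighbours.
            total -= spins[r][c] * (spins[r][(c + 1) % size] + spins[(r + 1) % size][c])
    return total


def gibbs_sweep(spins, temperature, rng):
    """Resample every spin once from its conditional distribution."""
    size = len(spins)
    for r in range(size):
        for c in range(size):
            field = (spins[(r - 1) % size][c] + spins[(r + 1) % size][c]
                     + spins[r][(c - 1) % size] + spins[r][(c + 1) % size])
            # P(s = +1 | neighbours) = 1 / (1 + exp(-2 * field / T)).
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field / temperature))
            spins[r][c] = 1 if rng.random() < p_up else -1


def magnetization(spins):
    size = len(spins)
    return abs(sum(map(sum, spins))) / (size * size)


rng = random.Random(0)
size = 16
for temperature in (1.5, 2.27, 3.5):
    spins = [[1] * size for _ in range(size)]
    for _ in range(200):
        gibbs_sweep(spins, temperature, rng)
    print(temperature, magnetization(spins), lattice_energy(spins))
```

For the infinite square lattice the exact critical temperature is 2 / ln(1 + sqrt(2)), roughly 2.269 in these units, so the magnetization is expected to drop as the temperature passes that value. A lattice this small only shows a smoothed version of the transition.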


Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing

Wang, Zhehui, Choong, Benjamin Chen Ming, Huang, Tian, Gerlinghoff, Daniel, Goh, Rick Siow Mong, Liu, Cheng, Luo, Tao

arXiv.org Artificial Intelligence

Quantum optimization is the most mature quantum computing technology to date, providing a promising approach towards efficiently solving complex combinatorial problems. Methods such as adiabatic quantum computing (AQC) have been employed in recent years on important optimization problems across various domains. In deep learning, deep neural networks (DNN) have reached immense sizes to support new predictive capabilities. Optimization of large-scale models is critical for sustainable deployment, but becomes increasingly challenging with ever-growing model sizes and complexity. While quantum optimization is suitable for solving complex problems, its application to DNN optimization is not straightforward, requiring thorough reformulation for compatibility with commercially available quantum devices. In this work, we explore the potential of adopting AQC for fine-grained pruning-quantization of convolutional neural networks. We rework established heuristics to formulate model compression as a quadratic unconstrained binary optimization (QUBO) problem, and assess the solution space offered by commercial quantum annealing devices. Through our exploratory efforts of reformulation, we demonstrate that AQC can achieve effective compression of practical DNN models. Experiments demonstrate that adiabatic quantum computing (AQC) not only outperforms classical algorithms like genetic algorithms and reinforcement learning in terms of time efficiency but also excels at identifying global optima.


A 10.8mW Mixed-Signal Simulated Bifurcation Ising Solver using SRAM Compute-In-Memory with 0.6us Time-to-Solution

Dee, Alana Marie, Moazeni, Sajjad

arXiv.org Artificial Intelligence

Combinatorial optimization problems are fundamental for various fields ranging from finance to wireless networks. This work presents a simulated bifurcation (SB) Ising solver in CMOS for NP-hard optimization problems. Analog domain computing led to a superior implementation of this algorithm as inherent and injected noise is required in SB Ising solvers. The architecture novelties include the use of SRAM compute-in-memory (CIM) to accelerate bifurcation as well as the generation and injection of optimal decaying noise in the analog domain. We propose a novel 10-T SRAM cell capable of performing ternary multiplication. When measured with 60-node, 50% density, random, binary MAXCUT graphs, this all-to-all connected Ising solver reliably achieves above 93% of the ground state solution in 0.6us with 10.8mW average power in TSMC 180nm CMOS. Our chip achieves an order of magnitude improvement in time-to-solution and power compared to previously proposed Ising solvers in CMOS and other platforms.


Annealing Machine-assisted Learning of Graph Neural Network for Combinatorial Optimization

Loyola, Pablo, Hasegawa, Kento, Hoyos-Idobro, Andres, Ono, Kazuo, Suzumura, Toyotaro, Hirate, Yu, Yamaoka, Masanao

arXiv.org Artificial Intelligence

While Annealing Machines (AM) have shown increasing capabilities in solving complex combinatorial problems, positioning themselves as a more immediate alternative to the expected advances of future fully quantum solutions, there are still scaling limitations. In parallel, Graph Neural Networks (GNN) have been recently adapted to solve combinatorial problems, showing competitive results and potentially high scalability due to their distributed nature. We propose a merging approach that aims at retaining both the accuracy exhibited by AMs and the representational flexibility and scalability of GNNs. Our model considers a compression step, followed by a supervised interaction where partial solutions obtained from the AM are used to guide local GNNs, from which node feature representations are obtained and combined to initialize an additional GNN-based solver that handles the original graph's target problem. Intuitively, the AM can solve the combinatorial problem indirectly by infusing its knowledge into the GNN. Experiments on canonical optimization problems show that the idea is feasible, effectively allowing the AM to solve problems of sizes beyond its original limits.


Self-Adaptive Ising Machines for Constrained Optimization

Delacour, Corentin

arXiv.org Artificial Intelligence

Ising machines (IM) are physics-inspired alternatives to von Neumann architectures for solving hard optimization tasks. By mapping binary variables to coupled Ising spins, IMs can naturally solve unconstrained combinatorial optimization problems such as finding maximum cuts in graphs. However, despite their importance in practical applications, constrained problems remain challenging to solve for IMs that require large quadratic energy penalties to ensure the correspondence between energy ground states and constrained optimal solutions. To relax this requirement, we propose a self-adaptive IM that iteratively shapes its energy landscape using a Lagrange relaxation of constraints and avoids prior tuning of penalties. Using a probabilistic-bit (p-bit) IM emulated in software, we benchmark our algorithm with multidimensional knapsack problems (MKP) and quadratic knapsack problems (QKP), the latter being an Ising problem with linear constraints. For QKP with 300 variables, the proposed algorithm finds better solutions than state-of-the-art IMs such as Fujitsu's Digital Annealer and requires 7,500x fewer samples. Our results show that adapting the energy landscape during the search can speed up IMs for constrained optimization.


Who Speaks Next? Multi-party AI Discussion Leveraging the Systematics of Turn-taking in Murder Mystery Games

Nonomura, Ryota, Mori, Hiroki

arXiv.org Artificial Intelligence

Multi-agent systems utilizing large language models (LLMs) have shown great promise in achieving natural dialogue. However, smooth dialogue control and autonomous decision making among agents still remain challenges. In this study, we focus on conversational norms such as adjacency pairs and turn-taking found in conversation analysis and propose a new framework called "Murder Mystery Agents" that applies these norms to AI agents' dialogue control. As an evaluation target, we employed the "Murder Mystery" game, a reasoning-type table-top role-playing game that requires complex social reasoning and information manipulation. In this game, players need to unravel the truth of the case based on fragmentary information through cooperation and bargaining. The proposed framework integrates next speaker selection based on adjacency pairs and a self-selection mechanism that takes agents' internal states into account to achieve more natural and strategic dialogue. To verify the effectiveness of this new approach, we analyzed utterances that led to dialogue breakdowns and conducted automatic evaluation using LLMs, as well as human evaluation using evaluation criteria developed for the Murder Mystery game. Experimental results showed that the implementation of the next speaker selection mechanism significantly reduced dialogue breakdowns and improved the ability of agents to share information and perform logical reasoning. The results of this study demonstrate that the systematics of turn-taking in human conversation are also effective in controlling dialogue among AI agents, and provide design guidelines for more advanced multi-agent dialogue systems.